
fix #3587: adding inform support for limit/batch fetching #3753

Merged
merged 1 commit into fabric8io:master
Jan 20, 2022

Conversation

shawkins
Contributor

Description

Related to #3587 #3616 - adds limit / batch fetching support to informers. This can cut down on the size of individual responses from the server and, eventually, the memory footprint of an informer. To support the latter, this will not fall back to fetching the entire list the way the go client does - that may need some refinement, as currently it will just keep retrying.

If the limit is not specified, the default is to fetch everything in a single list.
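For reference, a minimal sketch of how a caller might opt in to batched fetching (assuming the `withLimit` option on the `Informable` DSL introduced by this change; the namespace, batch size, and resync period are illustrative):

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;

public class LimitedInformerExample {
  public static void main(String[] args) {
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      // Page the informer's list/relist calls 100 items at a time instead of
      // fetching everything in a single list response.
      SharedIndexInformer<Pod> informer = client.pods()
          .inNamespace("example")   // illustrative namespace
          .withLimit(100L)          // batch size; omit to keep the default single-list behavior
          .inform(new ResourceEventHandler<Pod>() {
            @Override
            public void onAdd(Pod pod) { /* handle add */ }

            @Override
            public void onUpdate(Pod oldPod, Pod newPod) { /* handle update */ }

            @Override
            public void onDelete(Pod pod, boolean deletedFinalStateUnknown) { /* handle delete */ }
          }, 60 * 1000L);           // resync period in ms
      // ... keep the client (and the running informer) open for the life of the controller
    }
  }
}
```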

A behavioral difference introduced here is that the effect of a relist is no longer atomic on the cache - rather, each item in the relist will perform a cache update and create an event as it is seen. Since the cache is still eventually consistent either way, I think this change is not a concern.

The javadocs call out a performance warning - it seems that pagination hits etcd directly, so there's a concern about performance and scaling.

These changes do conflict, of course, with the api module creation, but that will be easy to resolve.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I added a CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@centos-ci

Can one of the admins verify this patch?

@shawkins
Contributor Author

@metacosm @attilapiros Some additional thoughts - it seems to be a pretty advanced case to consider using the limit option with an informer. There are quite a few comments around the behavior in the go client.

I don't see much in the literature about it being a built-in case, but it seems like you'd also look towards a sharding pattern - an overall/coordinating operator would create something like a statefulset of controllers, where each has responsibility for a subset of the resources - driven by an annotation (something that is filterable by informers/kubernetes) added by the overall operator.

@manusa
Member

manusa commented Jan 17, 2022

I don't see much in the literature about it being a built-in case, but it seems like you'd also look towards a sharding pattern - an overall/coordinating operator would create something like a statefulset of controllers, where each has responsibility for a subset of the resources - driven by an annotation (something that is filterable by informers/kubernetes) added by the overall operator.

I can't find the source now, but I recall that operators should be singletons (or work with some sort of leader-election mechanism). (Which is something of a disappointment given the nature of K8s and its overall purpose.)

Is this new sharding approach something documented?

I understand that this has nothing to do with the scope of the current PR, or does it?

@shawkins
Contributor Author

Is this new sharding approach something documented?

No, just a musing on how you would approach it assuming a "singleton" coordinating operator.

I understand that this has nothing to do with the scope of the current PR, or does it?

So far the only request for this limit/batching logic is to work towards scaling operators. If there are other applicable approaches, we should determine what is best to ultimately pursue. The next step from here is to control the memory footprint of the cache - from #3636 that looks like either a mapping/pruning function or a store that's not fully in memory. I don't think the latter seems like the right direction.

As for this change, it does seem like it can stand alone based upon the go client. There are situations where it is beneficial to fetch smaller, paginated results.

@shawkins
Contributor Author

No, just a musing on how you would approach it assuming a "singleton" coordinating operator.

To elaborate further - you'd have the coordinating operator set a hash on the resource as a label, in the range of the number of StatefulSet replicas. The StatefulSet pods would be set up to provide the pod name via an env property, which means they could parse out the pod index to determine which bucket they are responsible for. The individual informers would then be set up to only look for resources with that index label. Any dependent resources would need to have that label transitively set as well for the sharding pattern to work all the way down.
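A rough sketch of the worker side of that idea, assuming label-filtered informers and an env property (here called `POD_NAME`) exposed via the Downward API; the `shard-index` label key and the namespace are likewise illustrative, not part of this PR:

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;

public class ShardedWorker {
  public static void main(String[] args) {
    // StatefulSet pod names end in an ordinal, e.g. "my-operator-2".
    String podName = System.getenv("POD_NAME");
    String ordinal = podName.substring(podName.lastIndexOf('-') + 1);

    try (KubernetesClient client = new DefaultKubernetesClient()) {
      // Only watch resources that the coordinating operator has labeled for this shard.
      client.pods()
          .inNamespace("example")
          .withLabel("shard-index", ordinal)   // label applied by the overall operator
          .inform(new ResourceEventHandler<Pod>() {
            @Override
            public void onAdd(Pod pod) { /* reconcile */ }

            @Override
            public void onUpdate(Pod oldPod, Pod newPod) { /* reconcile */ }

            @Override
            public void onDelete(Pod pod, boolean deletedFinalStateUnknown) { /* clean up */ }
          });
    }
  }
}
```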

@manusa
Member

manusa commented Jan 17, 2022

To elaborate further [...]

Yes, I figured some mechanism like this. However, I might still see some flaws depending on the use case (from my undetailed, first-glance perspective). For some other uses, it might not be necessary to have an orchestrating process/controller (e.g. with a StatefulSet, the Pod's ordinal name might be enough to devise a constant strategy to self-assign the bucket).

Anyway, I guess this is really off topic for the PR.

@shawkins
Contributor Author

the Pod's ordinal name might be enough to devise a constant strategy to self-assign the bucket

To self-assign, the hash/index label would already need to be present on the resources.

Anyway, I guess this is really off topic for the PR.

Yes, it was just to add some more to the discussion about operator scaling. I'll add a reference to this in the other ongoing issue.

@manusa
Member

manusa commented Jan 17, 2022

the hash/index label would already need to be present on the resources

Well, in my imaginary example the hash (or input for the resource assignment) could be computed from the resource UID or something similar.

@shawkins
Contributor Author

Well, in my imaginary example the hash (or input for the resource assignment) could be computed from the resource UID or something similar.

Something (which I'm referring to as the overall operator) has to hash each resource to an index and apply that index as a label to the resource - it needs to be filterable by an informer. I tried to summarize all the extra work this would entail at #3587 (comment)
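As a rough illustration of that coordinating side (hashing the UID is just one possible input, and the `shard-index` label key, namespace, resource type, and replica count are assumptions for the sketch):

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class ShardAssigner {
  private static final int REPLICAS = 3;   // number of worker pods in the StatefulSet

  public static void main(String[] args) {
    try (KubernetesClient client = new DefaultKubernetesClient()) {
      for (Pod pod : client.pods().inNamespace("example").list().getItems()) {
        // Hash the stable UID into a shard index and record it as a filterable label.
        int shard = Math.floorMod(pod.getMetadata().getUid().hashCode(), REPLICAS);
        client.pods()
            .inNamespace("example")
            .withName(pod.getMetadata().getName())
            .edit(p -> new PodBuilder(p)
                .editMetadata()
                  .addToLabels("shard-index", String.valueOf(shard))
                .endMetadata()
                .build());
      }
    }
  }
}
```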

@manusa
Member

manusa commented Jan 18, 2022

Well, in my imaginary example the hash (or input for the resource assignment) could be computed from the resource UID or something similar.

Something (which I'm referring to as the overall operator) has to compute a hash to the index and apply it as a label to the resource - it needs to be filterable by an informer. I tried to summarize all the extra work this would entail at #3587 (comment)

😅 I forgot about the filtering part :)

From your detailed explanations I get the following:

  • The Orchestrating Pod (Overall Operator) is responsible for assigning a filterable property (in this case a label) to each resource to be processed (bottleneck).
  • 1-N worker Pods with shared informers that filter on the assigned resources and perform the actual work. I guess that scaling might impose a challenge, since the existing resources would need to be rehashed (I think you expand on this in your comment).

Support for some sort of server-side dynamic labels or wildcard filtering would be of help to completely remove the orchestrating part (kubernetes/kubernetes#30755)

@manusa manusa added this to the 5.12.0 milestone Jan 18, 2022
@sonarcloud

sonarcloud bot commented Jan 20, 2022

SonarCloud Quality Gate failed.

Bugs: 0 (A)
Vulnerabilities: 0 (A)
Security Hotspots: 0 (A)
Code Smells: 0 (A)
Coverage: 62.8%
Duplication: 0.0%

@manusa manusa merged commit 36d6731 into fabric8io:master Jan 20, 2022
@amydentremont

Just wanted to say thanks for adding this feature! My team just ran into needing the ListerWatcher to paginate in the Informer's Reflector and we realized the ability was something we just got from upgrading versions 🙌

@shawkins
Contributor Author

Just wanted to say thanks for adding this feature!

No problem, hope it works out well.
